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The Fisher-matrix formahsm is used routinely in the literature on gravitational-wave detection 
to characterize the parameter-estimation performance of gravitational-wave measurements, given 
parametrized models of the waveforms, and assuming detector noise of known colored Gaussian 
distribution. Unfortunately, the Fisher matrix can be a poor predictor of the amount of information 
obtained from typical observations, especially for waveforms with several parameters and relatively 
low expected signal-to-noise ratios (SNR), or for waveforms depending weakly on one or more 
parameters, when their priors are not taken into proper consideration. In this paper I discuss these 
pitfalls; show how they occur, even for relatively strong signals, with a commonly used template 
family for binary-inspiral waveforms; and describe practical recipes to recognize them and cope with 
them. 

Specifically, I answer the following questions: (i) What is the significance of (quasi-) singular 
Fisher matrices, and how must we deal with them? (ii) When is it necessary to take into account 
prior probability distributions for the source parameters? (iii) When is the signal-to-noise ratio 
high enough to believe the Fisher-matrix result? In addition, I provide general expressions for the 
higher-order, beyond-Fisher-matrix terms in the 1/SNR expansions for the expected parameter 
accuracies. 

PACS numbers: 04.80.Nn, 95.55.Ym, 02.50.Tt 

I. INTRODUCTION 

Over the last two decades, the prevaihng attitude in the gravitational-wave (GW) source-modeling community 
has been one of pre-data positioning: in the absence of confirmed detections, the emphasis has been on exploring 
which astrophysical systems, and which of their properties, would become accessible to GW observations with the 
sensitivities afforded by planned (or desired) future experiments, with the purpose of committing theoretical effort 
to the most promising sources, and of directing public advocacy to the most promising detectors. In this positioning 
and in this exploration, the expected accuracy of GW source parameters, as determined from the signals yet to be 
observed, is often employed as a proxy for the amount of physical information that could be gained from detection 
campaigns. However, predicting the parameter-estimation performance of future observations is a complex matter, 
even with the benefit of accurate theoretical descriptions of the expected waveforms and of faithful characterizations 
of the noise and response of detectors; in practice, the typical source modeler has had much less to go with. The 
main problem is that there are few analytical tools that can be applied generally to the problem, before resorting to 
relatively cumbersome numerical simulations that involve multiple explicit realizations of signal-plus-noise datasets. 

In the source-modeling community, the analytical tool of choice has been the Fisher information matrix Fij [h] = 
{hi,hj): here hi{t) is the partial derivative of the gravitational waveform h(t) of interest with respect to the z-th 
source parameter 9i, and "(•, •)" is a signal product weighted by the expected power spectral density of detector noise, 
as described in Sec. Ill B[ Now, it is usually claimed that the inverse Fisher matrix F^^[ho] represents the covariance 
matrix of parameter errors in the parameter-estimation problem for the true signal ho{t). This statement can be 
interpreted in three slightly different ways (all correct), which we examine in detail in Sec.|TTl and preview here: 

1. The inverse Fisher matrix F^^[ho] is a lower bound (generally known as the Cramer-Rao bound) for the error 
covariance of any unbiased estimator of the true source parameters. Thus, it is a frequentist error (see Sec. lII A"| : 
for any experiment characterized by the true signal ho{t) and a certain realization n(t) of detector noise, the 
parameter estimator ^ is a vector function of the total detector output s = u + Hq, and F,j^^[hQ] is a lower bound 

on the covariance (i.e., the fluctuations) of 9 in an imaginary infinite sequence of experiments with different 
realizations of noise. The Cramer-Rao bound is discussed in Sec. Ill CI 

2. The inverse Fisher matrix i^J^[/io] is the frequentist error covariance for the maximum-likelihood (ML) parameter 

estimator 9^^, assuming Gaussian noise, in the limit of strong signals (i.e., high signal-to- noise ratio SNR) or, 
equivalently, in the limit in which the waveforms can be considered as linear functions of source parameters. 
We shall refer to this limit as the linearized-signal approximation, or LSA. This well-known result is rederived 
in SecHm] 
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3. The inverse Fisher matrix F^- [ho] represents the covariance (i.e., the multidimensional spread around the 
mode) of the posterior probability distribution p{9q\s) for the true source parameters Oq, as inferred (in Bayesian 
fashion) from a single experiment with true signal Iiq, assuming Gaussian noise, in the high-SNR limit (or in the 
LSA), and in the case where any prior probabilities for the parameters are constant over the parameter range 
of interest. Properly speaking, the inverse Fisher matrix is a measure of uncertainty rather than error, since in 
any experiment the mode will be displaced from the true parameters by an unknown amount due to noise. ^ See 
Sec. Ill El for a rederivation of this result. 

As pointed out by Jaynes 1] , while the numerical identity of these three different error-like quantities has given rise to 
much confusion, it arises almost trivially from the fact that in a neighborhood of its maximum, the signal likelihood 
p{s\6o) is approximated by a normal probability distribution with covariance Fr^\ In this paper, I argue that the 
Cramer-Rao bound is seldom useful in the work of GW analysts fSec. Ill C|) . and while the high-SNR/LSA frequentist 
and Bayesian results are legitimate, they raise the question of whether the signals of interest are strong (or linear) 
enough to warrant the limit, and of what happens if they are not. In addition, if we possess significant information 
about the prior distributions (or even the allowed ranges) of source parameters, it is really only in the Bayesian 
framework that we can fold this information reliably into the Fisher result (Sec. Ill Pp . 

Thus, I recommend the Bayesian viewpoint as the most fruitful way of thinking about the Fisher-matrix result 
(although I will also derive parallel results from the frequentist viewpoint). Of course, the study of Bayesian inference 
for GW parameter-estimation problems need not stop at the leading-order (Fisher-matrix) expression for the posterior 
likelihood: Markov Chain Monte Carlo (MCMC) algorithms [1] can provide very reliable results, immune from any 
considerations about signal strength, but they require a significant investment of time to implement them, and of 
computational resources to run them, since they necessarily involve explicit realizations of the noise. More rigorous 
Bayesian bounds (such as the Weiss- Weinstein and Ziv-Zakai bounds examined by Nicholson and Vecchio [3]) can 
also be derived, but they require a careful appraisal of the nonlocal structure of the likelihood function. 

By contrast, the Fisher-matrix formalism is singularly economical, and it seems clear that it will always be the 
first recourse of the GW data analyst. To use it reliably, however, we must understand the limits of its applicability. 
The purpose of this paper is to explore these limits. I do so by providing practical solutions to three issues that were 
already raised in the seminal treatments of GW detection by Finn [4] and by Cutler and Flanagan 0| , but that seem 
to have been almost ignored after that: 

1. What is the significance of the singular or ill-conditioned Fisher matrices that often appear in estimation 
problems with several source parameters, and how do we deal with them? Can we still believe the Fisher result 
in those cases? (See Sec. IIVH 

2. When is it necessary to take into account the prior probability distributions for the parameters, even if specified 
trivially by their allowed ranges? (See Sec. El) 

3. When is the high-SNR/LSA approximation warranted? (As anticipated above, the high-SNR limit is equivalent 
to the LSA, as we shall show in Sees . HID] and UTeI ) That is, how strong a signal will we need to measure if we 
are to believe the Fisher-matrix result for its uncertainty? (See Sec. IVII ) 

Last, I discuss the extension of the LSA beyond the leading order, in both the frequentist and Bayesian parameter- 
estimation frameworks (Sec. IVII[) . in a form that the adventurous GW analyst can use to test the reliability of the 
Fisher result (but higher-order derivatives and many-indexed expressions start to mount rapidly, even at the next- 
to-leading order). By contrast, I do not address the reduction in parameter-estimation accuracy due to the presence 
of secondary maxima in the likelihood function, as noticed \d\ and carefully modeled by Balasubramanian and 
colleagues in their extensive Monte Carlo simulations of ML estimation for inspiraling binaries using Newtonian and 
first post-Newtonian waveforms. 

My treatment follows Refs. [1, @, as well as the classic texts on the statistical analysis of noisy data (e.g., Refs. 
[1,[3[3)- I am indebted to Jaynes and Bretthorst for their enlightening, if occasionally blunt, perspective on 

frequentist and Bayesian parameter estimation. The reader already familiar with the standing of the Fisher-matrix 
formalism in the frequentist and Bayesian frameworks can skip Sees. Ill Al (a refresher on the difference between the 
frequentist and Bayesian viewpoints) and III CI III El (a pedagogical derivation of the three approaches to the inverse- 
Fisher-matrix result that were introduced at the beginning of this section) , and move directly to discussion of the three 
issues in Sees. IIVHVII and to the higher-order formalism in Sec. IVIIi referring back to Sec.|lT]as needed to establish 



In the high-SNR/LSA Hmit with negUgible priors, the posterior probabihty mode, seen as a frequentist statistic, coincides with the ML 
estimator; thus its fluctuations are again described by the inverse Fisher matrix. 
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notation. Whenever my discussion requires a practical example, I consider signals from inspiraling binaries of two 
black holes, both of mass IOMq, as described by the restricted post-Newtonian approximation for adiabatic, circular 
insp irals (see Sec. in my examples, I assume detection and parameter estimation are performed on Initial-LIGO 
[12| data, and I adopt the LIGO noise curve of Table IV in Ref. [13]. Throughout, I use geometric units; I assume the 
Einstein summation convention for repeated indices; and I do not distinguish between covariant and contravariant 
indices, except in Sec. IVIII 

II. THREE ROADS TO THE FISHER MATRIX 

In this section I discuss the "three roads" to the inverse Fisher matrix as a measure of uncertainty for GW 
observations: the Cramer-Rao bound (Sec. Ill C|) . the high-SNR/LSA limit for the frequentist covariance of the ML 
estimator (Sec. IIID[) . and the high-SNR/LSA limit for the single-experiment covariance of the Bayesian posterior 
distribution (Sec. UTE)) . Sections \n~K\ and HTB] are refreshers about frequentist and Bayesian parameter estimation, 
and about the analytical expression for the likelihood of GW signals in Gaussian noise. 

A. A refresher on the frequentist and Bayesian frameworks 

The frequentist (or orthodox) approach to parameter estimation for GW signals can be summed up as follows: 

1. We are given the detector data s and we take it to consist of the true signal ho — h{9o) (where 6o is the vector 
of the true system parameters) plus additive noise n. 

2. We select a point estimator 9{s): that is, a vector function of detector data that (it is hoped) approximates the 
true values of source parameters, except for the statistical error due to the presence of noise. One important 
example of point estimator is the ML estimator 9^^, which maximizes the likelihood p{s\9) of observing the 
measured data s given a value 9 of the true parameters. For additive noise, this likelihood coincides with the 
probability of a noise realization n — s — h{9), and for Gaussian noise it is given below in Sec. Ill Bl 

3. We characterize statistical error as the fluctuations of 0(s), computed over a very long series of independent 
experiments where the source parameters are kept fixed, while detector noise n is sampled from its assumed 
probability distribution (often called the sampling distribution). 

The estimator 9 is usually chosen according to one or more criteria of optimality: for instance, unhiasedness requires 
that {9{s))n (the average of the estimator over the noise probability distribution) be equal to 9o. 
A rather different approach is that of Bayesian inference: 

1. We do not assume a true value of the system parameters, but we posit their prior probability distribution p{9). 

2. Given the data s, we do not compute estimators, but rather the full posterior probability distribution p(9\s), 
using Bayes' theorem p{9\s) = p{s\9) x p{9)/p{s), where p{s) = J p{s\9) p{9) d9 . 

3. We characterize statistical error in a single experiment by the spread of the posterior distribution p{9\s). 

The differences between the frequentist and Bayesian approaches are not only mathematical, but also epistemic: as 
their name indicates, "frequentists" view probabilities essentially as the relative frequencies of outcomes in repeated 
experiments, while "Bayesians" view them as subjective^ indices of certainty for alternative propositions. For an in- 
troduction to the contrasting views, I refer the reader to the excellent treatise (very partial to the Bayesian worldview) 
by Jaynes []| , and to Ref. @ for a more GW-detection-oriented discussion. 

Once actual detections are made, the Bayesian approach of computing posterior probability distributions for the 
signal parameters given the observed data seems more powerful than the frequentist usage of somewhat arbitrary 
point estimators; the latter will always result in throwing away useful information, unless the chosen estimators are 
sufficient statistics (i.e., unless the likelihood depends on the data only through the estimators). As for statistical 
error, it seems preferable to characterize it from the data we have (actually, from the posterior distributions that 



^ Only in the sense that subjects with different prior assumptions could come to different conclusions after seeing the same data; indeed, 
Bayesian statistics describes how prior assumptions become deterministically modified by the observation of data. 
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we infer from that data), rather than from the data we could have obtained (i.e., from the samphng distribution of 
estimators in a hypothetical ensemble of experiments). 

As Cutler and Flanagan point out, however, it is in the current pre-data regime that we seek to compute 
expected parameter accuracies; in the absence of actual confirmed-detection datasets, it seems acceptable to consider 
ensembles of possible parameter-estimation experiments, and to use frcquentist statistical error as an inverse measure 
of potential physical insight. The best solution, bridging the two approaches, would undoubtedly be to examine the 
frequentist distribution of some definite measure of Bayesian statistical error; unfortunately, such a hybrid study is 
generally unfeasible, given the considerable computational requirements of even single-dataset Bayesian analyses. 



B. Likelihood for GW signals in Gaussian noise 



Under the assumption of stationary and Gaussian detector noise, the likelihood logp{s\9) can be obtained very 
simply from a noise-weighted inner product of the detector output and of the signal h{9) (see for instance Eq. (2.3) 
in Ref. [5]): 

p(s|0) cx e-(^-''W'^-''(«»/2; (1) 

the weighting is performed with respect to the expected power spectral density of detector noise by defining the 
noise- weighted inner product of two real- valued signals as 



where h{f ) and g{f) are the Fourier transforms of h{t) and g{t), denotes complex conjugation, and Sn{f) is the 
one-sided power spectral density of the noise. From the definition of Sn{f) as (n*(/)ri(/'))„ = 55„(|/|) S{f — /'), we 
get the useful property 

((/i,n)(n,g))„ = (/i,5), (3) 
where again "(•)«" denotes averaging over the probability distribution of the noise. 



C. First road: Derivation and critique of the Cramer— Rao bound 

The derivation in this section is inspired by the treatment of Ref. [l], p. 518], and it is given for simplicity in the 
case of one source parameter. We wish to pose a bound on the frequentist estimator variance 

var^ = ((^(s)~(^(s)))')^: (4) 

to do this, we consider the ensemble product 

(u(s), w(s))„ = J u{s) v{s) p{s\eo) ds, (5) 

where p{s\9q) is the likelihood of observing the detector output s given the true source parameter 9o, or equivalently 
the likelihood of observing the noise realization n — s — Hq. Setting v{s) = 9{s) — (0(s))^, we obtain a bound on 
{v,v)n = vslt9 from the Schwarz inequality: 

var9={v,v)^>y^. (6) 

[U, U)n 

This inequality is true for any function u{s) of the data, and it becomes an equality when u{s) cx v{s). Since we wish 
to derive a bound that applies generally to all estimators, we should not have (or try) to provide too much detail 
about 9 (and therefore v{s)). A simple assumption to make on 9 is that it is an unbiased estimator: 



(7) 
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How does this help us? It turns out that we can write a function d(s) whose ensemble product with any other function 
w{s) yields the derivative dgg{'w{s))n] this function is just d{s) — dg„ \ogp{s\0Q), because 

w{s)[deologp{s\0o)]p{s\0o)ds ^ / w(s) 9eoP(s|^o) = 9e„ w{s)p{s\9o) ds = deo{w{s))n, (8) 



assuming of course"^ that we can exchange integration and differentiation with respect to ^o- For any s, d{s) encodes 
the local relative change in the likelihood function as 9o is changed. It follows that (d(s),u(s))„ — ^^^(^(s))^ = 1. so 
from Eq. ^ we get"* 

{d{s),d{s)) ~ {dg„ logp{s\eo),do„ logp(s|6'o))„ ' 

which is the unbiased-estimator version of the Cramer-Rao bound. If the estimator is biased, we can still use the 
Schwarz inequality by providing the derivative of the bias hiOo) with respect to 

{e{s)),, = eo + h{eo) => dgM.s))„ = i + dsA^o), (lo) 

and therefore 

^ 11 1 n0\'f^ M.^N ■ (11) 

{obo logp(s|6'o), deo logp(s|6'o))„ 
Generalizing to a multidimensional expression is straightforward, if verbose (see, e.g., Ref. p^): 

covar„(^„^0 > {S,^ + 5„&,(0o))i^™j (-5,/ + dM^^))^ (12) 
where the Fisher information matrix is defined by 

F,i - ((9aogp(s|0o)), {di\ogp{s%))) ^ = ~(dmogp{s%)) ^. (13) 

The second equality is established by taking the gradient of J((9i logp(s|0o))p(s|^o) c?*, and remembering that 
di J p{s\9o) ds = dil = 0. With the help of Eqs. ^ and ([3]), we can compute the Fisher matrix for GW signals 
in additive Gaussian noise, which is the familiar expression Fij = {dih, djh). 

The full expression for the Cramer-Rao bound, which includes the effects of bias, has interesting consequences, 
for it implies that biased estimators can actually outperform^ unbiased estimators, since the dmbi{do) can be negative. 
Unfortunately, we have no handle on these derivatives without explicitly choosing a particular estimator (which goes 
against the idea of having a generic bound), so the Cramer-Rao bound can only give us a definite result for the 
subclass of unbiased estimators. 

As pointed out by Cutler and Flanagan d, App. A 5], it follows that the bound cannot be used to place absolute 
limits on the accuracy of estimators (i.e., lower bounds on frequentist error) — limits that would exclude or severely 
limit the possibility of inferring the physical properties of sources from their emitted GWs. Even if the lower bound 
for unbiased estimators is very discouraging, there is always a chance that a biased estimator could do much better, 
so we cannot use the bound to prove "no go" theorems. 

Going back to Eq. ([6]), we note that the bound is satisfied as an equality when 

uis)<^vis) => dis) = deo\ogpis\eo)^qi0oms)- {0{s))n]- (14) 
By integrating, we obtain a relation between the likelihood and the estimator: 

p(^l^o) = |j|^e-'('^")^(^); (15) 



^ This assumption fails for some (mildly) pathological likelihood functions, which can provide counterexamples to the Cramer-Rao bound. 
* To obtain Eq. we need to notice also that for any w(s), {d{s), {w{s))„)n = 0, since {w{s))„ does not depend on s (but only on Bq), 

and the integral of Eq. ((HJl reduces to (ui(s))„ f d0gp(s\6o)ds = {w{s))„dggl = 0. 
^ This is true even if we evaluate the performance of estimators on the basis of their quadratic error 

m - do.m - eoi))n > b,{eo)bi(eo) + {5,^ + d„M<}o))F-^{5,i + dM^o)) 

rather than on the basis of their variance. 
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the estimation problems (i.e., the pairings of given Ukehhoods and chosen estimators) for which this relation holds 
true are said to belong to the exponential family, and these problems are the only ones for which the Cramer-Rao 
bound is satisfied exactly as an equality. Equation (fT5|) generalizes trivially to multidimensional problems by replacing 
the exponential with exp {—lk{Oo)6k{s)} . Unfortunately, for a given p(s\9q) there is no guarantee that any unbiased 
estimator exists that satisfies Eq. (|15p and that therefore can actually achieve the bound; all we can say in general 
about the performance of unbiased estimators is that they will underperform the Cramer-Rao bias, but we do not 
know how badly. As discussed above, the bound tells us nothing in general about biased estimators. 

It follows that the bound cannot be used to establish guaranteed levels of accuracy (i.e., upper bounds on frequentist 
error), which would prove the possibility of inferring the physical properties of sources from their GWs. We can only 
do so if we can identify a specific estimator that achieves the bound. In the next section we shall see that the ML 
estimator® does so in the high-SNR limit, where waveforms can be approximated accurately as linear functions of 
their parameters within the region of parameter space where p{s\6) is not negligible (so the high-SNR limit coincides 
with the limit in which the LSA is accurate). 

We conclude that the Cramer-Rao bound is seldom useful to the GW analyst as a proper bound, whether to make 
positive or negative expected-accuracy statements; where it is useful, it reduces to the high-SNR/LSA result for the 
ML estimator. 



D. Second road: Derivation and critique of the frequentist high-SNR/LSA result 

We denote the true signal as ho (so s = ho + n), and expand the generic waveform h(9) around /iq, normalizing 
signals by the optimal signal-to-noise ratio of the true signal, A = ^ (ho-, ^o) (also known in this context as signal 
strength): 



h{9) = ho + Okhk + 9jekhjk/2 + ■■■ = A{ho + 9kh + 9j9khjk/2 + ■■■); 



(16) 



here we are translating source parameters as needed to have h{0) — ho, defining hi = dih\e^o, hij — dijh\g^o (and so 
on), and ho = ho/A, h^ = h^/A (and so on).^ The likelihood is then given by* 



p{s\9) cx e-(^-''('')'^-''(»))/2 = exp{ - (n,n)/2 - A^[9,9k{hM) + Wd^Mi) + • • •]/2 

+ A[9jin, h,) + 9j9kin, h,k)/2 + 9,9k9i[n, V)/3! + ••■]}• 

The ML equations djp{s\9^^) — are given by 
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(n, hj) + 9k{n, h.jk) + 9k9i{n, hjki)/2 + 



9k{hj,hk) + 9k0i {{hj,hki)/2 + {hk, hji)) +■■■ 



(17) 



(18) 



where we have divided everything by A'^, and we omit the superscript for conciseness. A careful study of Eq. 

shows that it can be solved in perturbative fashion by writing 9^^ as a series in 1/A, 



f^/A + 9f 

and by collecting the terms of the same order in Eq. p^ . 



0(1 /A): in,h,)-9i^\h,,hk)^0 



0(1/ A^): 9^^\n,h,k) 
0{l/A^) : ... 



{{h,~hki)/2+{hk,h,i))-9f\h,~hk)=Q 



(19) 



(20) 



^ Indeed, Eq. JTS} implies that if both an efficient (i.e., bound-achieving) unbiased estimator and the ML estimator exist, they must 
coincide. To show this, we notice that if the ML estimator exists, the log-derivative di \ogp{s\d) = —dilk{S){Ok ^ ^'fe) must be zero at 
e = e^^, from which it follows that §k = 0^^. 

The statistical uncertainty in the estimated signal strength can still be handled in this notation by taking one of the /ij. to lie along Kq; 
the corresponding represents a fractional correction to the true A. 
* Formally, it is troubling to truncate the series expression for the exponent at any order beyond quadratic, since the integral of the 
truncated likelihood may become infinite; the important thing to keep in mind, however, is that the series need converge only within 
a limited parameter range determined self-consistently by the truncated-likelihood estimator, by compact parameter ranges, or (in the 
Bayesian case) by parameter priors. Similar considerations apply to the derivation of the higher-order corrections given in Sec. IVIll 
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thus the ML solution is given by 

{hj,hiy'^{{n,hik)(hk,hiy^{hi,n) - [{h^,hki)/2 + (hk,hu)) {hk,hrn)~^ {hrn,n){hi,hn)~^ {hn,n)] + (21) 

Thus we see that the hmit of large A (i.e., high SNR) coincides with the linearized- signal approximation (LSA) 
where only the first derivatives of the signals are included. In the LSA, the likelihood is just 

p{s\e) cx exp{ - (n, n)/2 - OjOk {hj ,hk)/2 + 9, {h, , n) } 

^eyiYy{-[n,n)/2- A^9j9k{hj,hk)/'2. + A9j{hj,n)] (LSA), 

and the ML estimator is given by 

9f^ = {\/A){h,Mr\hk,n) (LSA). (23) 

Since {{hk, n))n = 0, we see also that the ML estimator is unbiased. The variance of 9^^ is then obtained by averaging 
9Y^9]f^ over noise realizations, 

\ 1 (24) 

^ ^{hj,hi)-\hi,Kn){K,,hkr^ ^ — {h,~hk)-' (LSA), 

and it coincides with the mean quadratic error in the frequentist sense. In Eq. (|24|) . the second equality follows from 
Eq. ([3]). The interpretation of the limit is that, for strong signals, the typical 9^^ — 9Qj becomes small enough that 
the log-likelihood is accurately described by the product of detector data and a linearized signal. 

Equation ((24|l is the standard Fisher-information-matrix result, and it implies that in the high-SNR/LSA limit the 
ML estimator achieves the Cramer-Rao bound. As we shall see in Sec. IVIIi the next-order correction to the variance 
scales as 1/A*, not 1/A^. This is because all 0{1/A^) terms contain odd numbers of n, whose products vanish under 
the ensemble average. The fact itself that there is a next-order correction shows that for generic A the ML estimator 
does not achieve the bound. 

The fact that the Cramer-Rao bound is achieved in the high-SNR/LSA limit, but not beyond it, can also be seen 
in the light of Eq. which encodes a standard form for the estimation problems in the exponential family. To 

express the LSA likelihood in this form, we can set m(s) = g-(s-'io:S-'io)/2 g^j^j Z{9) = g0jek{hj,hk)/2. remains to 
establish that 

^l,{9)9f'^{s)^9,{h,,s~ho), (25) 

which is satisfied by lj(9) — —{hj, hk)9k [see Eq. ((23|) ]. Now, if additional terms are added to Eq. (|22p. beginning with 
terms cubic in the 9i, 9Y^{s) comes to be a nonlinear function of the signal, such that no ~lj{9) can multiply it in the 
right way to reconstruct the likelihood. It then follows that the estimation problem moves outside the exponential 
family, and the Cramer-Rao bound cannot be achieved. 

It is possible (but perhaps not desirable, as we shall see shortly) to modify the ML estimator to take into account 
prior knowledge about the expected distribution of sources. The resulting maximum-posterior estimator 9^^ is defined 
as the mode of the posterior probability p{9\s) = p{s\9)p{9)/p{s), 

9^^ = maxloC(,p(6l|s) = maxlocep(s|6l)p(6'). (26) 

This is a biased estimator: in the high-SNR/LSA limit, and with a Gaussian prior p(0) cx exp{—Pij{9i—9f)(9j—9j')/2} 
centered at 9^ (the only prior that can be easily handled analytically), we find 

^MP ^ ^^MPy^ ^ [(/-^^ ^ p^^./^2] -1 ^p^^/A^) 9l (LSA/Gaussian prior); (27) 

thus the 9^^ becomes unbiased for A ^ cx) (indeed, in that limit 9^^ tends to 9^^). For the frequentist variance we 
find 

/^MP^MPV _/^MPX /^MPX ^/^MP^MPX _ ^MP^MP 
\2 3 In \* / n\ 3 In \ ^ 3 In * 3 

1 _i (28) 

= {{K, hk) + P^k|A^] (hk, hi) [ihi,hj) + Pij/A^] (LSA w/prior), 
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which coincides^ with the generaHzed Cram er-Rao bound of Eq. , proving that the estimation problem defined 
by the LSA hkehhood and belongs to the exponential family. 

The reason why is not too useful to characterize future parameter-estimation performance is that we expect a 
reasonable measure of error to converge to the effective width of the prior in the limit of vanishing signal strength. 
Instead, in the absence of any information from the experiment, becomes stuck at the mode of the prior, and its 
variance [in Eq. (j28p ] tends to zero. This behavior occurs for any nonuniform prior. 



E. Third-road Derivation of the Bayesian high-SNR/LSA result 



We now wish to show that in any single experiment, if the high-SNR/LSA limit is warranted (and if the parameter 
priors are uniform over the parameter region of interest), the inverse Fisher-information matrix yields the variance 
of the Bayesian posterior probability distribution. To do so, we rewrite Eq. (|17p in terms of normalized parameters 
9i = A9v. 

p{s\9) cx exp{-(n, n)/2 + [(n, 1^)9, + ^(n, h,k)9j0k/2 + -^{n, hjki)Wi/^^- + 0(1/^3) 
We can build the variance from the posterior mean 



9,p{s\9)d9 / / p{s\9)d9 



(30) 



and the quadratic moment 



{9,9,)^ = / 9,9,p{s\9) d9 / / p{s\9) d9 



(31) 



where denotes integration overp(s|6'). The idea is to proceed in perturbative fashion, writing the moments as 

series in e = \/A: taking (^i)p as an example, 



(n) 



\(") 



e=0 



(32) 



Since e appears at both the numerator and denominator of Eq. pop . we write 



{^^)p 



J piO) d9 + ej9.^d9 + ^J h ^d9 



Jpi0)d9 + ej'-fld9 + ^j'-^d9 + --- 
(where the argument of p implies that the (n)-th derivative is evaluated at e = 0), and therefore 



(33) 



(0) 



:,p{o)d0 / / p{o)de 



(0^ - 



dpio) 



(0) f dp{0) 



de 



d9 



p{0)d9- 



(34) 



Note that the Fisher matrix that must be substituted into Eq. II12I I is still ~{djdf,p{s\9))n = {hj,hfS), and not —{djdk\p(s\d)p{9)\)n = 
+ Pjk- The prior distribution does not concern the Cramer-Rao bound, which is computed from the likelihood alone for a fixed 
known value of the true source parameters. Instead, we happen to be using an estimator that takes into in account prior information, 
which enters into the Cramer— Rao bound via the derivative of the bias. 

For uniform priors (e.g., rectangular distributions corresponding to the allowed parameter ranges), actually becomes undefined in 

the Q limit. 
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similar expressions hold for {0i9j)p, and a general expression for the (n)-th-order contribution is given in Sec. IVIIBI 
The e ^ limit coincides with the limit of large signal strengths, or of vanishing derivatives higher than the first, 
since in that case Eq. ([29|) truncates to Eq. ([22| . In this limit, 

(^"')p = = (^'' ^^^-'(n, (LSA) (35) 

and 

(AdAS,)^ = {{9, - - ))f = (LSA), (36) 

and therefore 

{^^0,)^ = ^iKh,)-' ^ih^,h,)-' (LSA), (37) 

as can be seen by rewriting the exponential of Eq. as 

p{s\9) cc e^p{-ih, hj){9, - {9,)p){9j - (^,)p)/2}, (38) 

where we have omitted factors independent from 9 that cancel out in the normalization of p(s\9). 

Reinstating A in Eq. ([55]) we see that in the high-SNR/LSA limit the mean of the posterior distribution coincides 
with the ML estimator, as is reasonable, since the average of a normal distribution coincides with its mode. The two 
however differ when higher-order terms are included, as we shall see in Sec. IVIII From Eq. we see also that, to 
leading order, the variance of the posterior distribution is experiment-independent, and it coincides with the variance 
of the ML estimator (remember however that the two have very different interpretations^^). 

With the addition of a Gaussian prior p(0) cx g-Pa^i^j/^ centered at 9 = 0, Eqs. ([35]) and ([36]) change only slightly:^^ 

{9,)^[{h,h,) + Pr,/AY\n,h,) _ , 
, _ _ , _ _ ^ , (LSA/Gaussian prior). (39) 

{A9iA9,)^^[{h,,h,) + P,,/A']-' 



Note that p{9) oc e^^^/'^^^^'^^'^j/^ is formally an 0{1/A^) contribution to the likelihood exponential that would enter 
the 1/A expansion beginning at that order. However, if Pij is large enough to matter at the signal strengths of 
interest, it probably makes sense to bundle it with the zeroth-order terms as we did here. In contrast with Eq. ([28]) 
for the frequentist variance of 9^^ , we see that in the limit of vanishing signal strength the variance of the posterior 
goes to the variance Pij of the prior. 



III. STANDARD COMPACT-BINARY SIGNAL MODEL 



Throughout the rest of this paper, our fiducial model for compact-binary signals will be simple stationary-phase- 
approximated (SPA) waveforms including phasing terms from the spin-orbit and spin-spin interactions of parallel 
or antiparallel component spins. Parameter estimation with these waveforms was studied by Poisson and Will 
In this paper we adopt second-order post-Newtonian^'^ (2PN) Fourier-domain waveforms as written by Arun and 



If we define the quadratic error of the posterior distribution as {diBj)^ (which is appropriate given that the true signal is at S = 0), 

we must increment (hi,hj)~^ by the experiment-dependent quantity {Si){9j) = (hi,hi)~^ {n,hi){hra,n){hrn,hj)~^ ■ Interestingly, the 

frequentist average of the Bayesian error ((^i^j)p)„ is 'i(hi,hj)~^ , twice the frequentist variance of ifi^^ . 

With the Gaussian prior, the quadratic error becomes {(hi, hi) + Pii/A^]~^(hi, hm){{hm,, hj) + Pmj I ^X'^ ■ 

Waveform phasing expressions accurate to 3.5PN order are also provided in Ref. |l5(l . We do not use these in this article for the sake 

of simplicity, since they would not change the qualitative picture of parameter estimation presented here. For the reader's reference, 

however, the higher-than-2PN corrections to Eq. I I41I I. including the errata to Ref. [Tsll . are 



/38645 65 -.s 

r rf \o^v 4- 

V 252 9 ' 



11583231236531 640 , 6848 6848 , , A / 15335597827 225, 

-K^ 7 log(4) + h 

4694215680 3 21 21 / V 3048192 12 



1760 „ 12320 \ 76055 2 
-e H —A 1) + -——V - 



3 9 y 1728 1296 



6848 R , / 77096675 378515 . „ „ . 

V losv + ttI 1 n ri^ 

21 V 254016 1512 756 



where 7 = 0.57721 ■■ ■ is Euler's constant, and A = —1987/3080 and 8 = —11831/9240 are recently determined constants in the PN 
expansion fl^ . 
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colleagues [ISf : 



h{Mc, 7], f3,a, (po, to; 



(40) 



with 



128 77^5 



^°^g^ + ^^)«^ + (4/3-16.).^ + 10 



3058673 5429 



1016064 1008 



617 
144^ 



(41) 

where v — {ttAI f)^'^ , AI = mi + r7i2 is the total mass of the binary, rj = 17111712 / M'^ is the symmetric mass ratio, 
Mc = Mrfl^ is the chirp mass. The spin-orbit parameter (3 and the spin-spin parameter a 

are given by 



L ■ S, 



113 



Km) 



7577 



113L ■ [Si + S2] + 75L ■ [(ma/mQSi + (mi/m2)S2] 
12 M2 



(42) 



and 



721(L • Si)(L • S2) - 247 Si • S2 

48 7711 777,2 M"^ 



(43) 



with Si and S2 the spins of the binary components. We truncate waveforms at the (Keplerian) last stable circular 
orbit {v = 1/V^). 

For simplicity, in this article we do not discuss the estimation of the amplitude parameter A that would multiply 
the right-hand side of Eq. ([M)) . [From Eqs. ^ and it follows that {dAh,dih) — for i 7^ ^, so the amplitude 
A effectively decouples from all other parameters in the Fisher matrix.] However, all discussions to follow can 
accommodate the addition of A with trivial modifications. 



IV. THE SINGULAR CASE OF THE DISAPPEARING PARAMETER 



In Sec. |TT] we have examined the interpretation of the inverse Fisher matrix as a frequentist or Bayesian measure 
of error or uncertainty. In this section, we discuss what happens when the Fisher is matrix singular, or almost so, 
so that the attempts to invert it numerically yield warnings that it is badly conditioned. It is pedagogical to begin 
this discussion by considering the case where the matrix is exactly singular fSec. IIV"S| . and then to widen our scope 
to approximate singularity (Sec. |IVB|. The conclusion is that a singular Fisher matrix is almost always a symptom 
that the high-SNR/LSA limit is not to be trusted, that prior probabilities play an important role, or both. 



A. Singular Fisher matrix 

A singular Fisher matrix implies that the corresponding LSA likelihood (j22[) is a singular normal distribution [isj , 
which is constant along the directions of the Fisher- matrix eigenvectors with null eigenvalues^^ (henceforth, somewhat 
improperly, we shall call these null eigenvectors) , so the ML equation has no solutions, and the even moments of the 
distribution are infinite, even for parameters that do not appear in the null eigenvectors. Thus, the frequentist variance 
of the ML estimator and the Bayesian variance of the posterior distribution are (formally) infinite for all parameters. 

How to deal with this? If the signal is really linear, so that the LSA expressions are exact, it is possible to discard the 
combinations of parameters that correspond to the null eigenvectors, and characterize the variance of the remaining 
parameters. Let us see how, in the frequentist and Bayesian frameworks. In what follows, we denote the total number 
of source parameters by N , and the number of non-null eigenvectors by R. 



A reasonable objection to computing the eigensystem of the Fisher matrix is that it leads to taking linear combinations of parameters 
that may have different units. It is possible to avoid this problem by looking at the Fisher matrix more abstractly as a linear operator, 
and talking of its range and null space [19( : or more pragmatically, by dividing all parameters by their typical range; or perhaps by 
taking their logarithm (since we are working with errors, units can be forgotten as additive constants), which in the linearized theory 
is equivalent to dividing by the true parameters. We are going to largely ignore this issue, treating the parameters as pure numbers 
resulting from adopting a God-given system of units. 



11 



In the frequentist ML framework, we write {hi,hj) in the singular- value (SV) decomposition-'^^ as 
J^xw^o Q'^^^^^^o'"!^'^ (with (fc) = 1, . . . , i?), or BSO"^ in matrix notation (with G an iV x i? matrix with orthonormal 
columns, and S a diagonal matrix formed from the R non-zero eigenvalues). We can then refactor the ML equation 
as 

(eSe^)0 = A-in (9^0) = A-i(l]-ie^n) ^ c(^) = A-i(AW)-inW, (44) 

where c*^*^^ and n^*^^ denote the coefficients of the decompositions of 6 and {hi^n) with respect to the normalized 
non-null eigenvectors of (hi,hj). Since the ensemble average (n^'^^n^'^),; is just A'^'"'^;^*^'^^''^ (where S is Kronecker's 
delta), the frequentist covariance of the ML estimators c*-*^^ is the diagonal matrix A~^(A^'^'')~^(5'*'^''). 

In the Bayesian framework, the quantities of interest are the moments of the c^'^^ over infinite ranges of the c''^-' 
and of the coefficients C'^^' (with {K) — 1, . . . ,N ~ R) corresponding to the null eigenvectors, which are not included 
in the SV decomposition. Formally, these moments are ratios of two infinities, because the LSA likelihood is not a 
function of the C [not even through the n'^^-' = 9^^^^ {hi, n) terms, which are zero since [of^^hi, d^f^^hj) = 0], but they 
may be evaluated as improper integrals, in the limit of the ranges for the c^^^ extending to infinity: 

^ Jp{s\c)dcdC ' Acl'^- / ^^Ar^m , _A „ -A [A ) d . (45) 

We can then work back to the frequentist components of the covariance matrix (or the Bayesian posterior moments) 
that involve any 0i that do not appear in the null eigenvectors. All other 9i, however, are completely indeterminate.^^ 
In the frequentist framework, it may be possible to work back to interval estimates of their values by combining a 
ML estimate of the c''^-' with finite allowed ranges for some of the 9i] however, this would constitute a form of prior 
distribution for the 9i , which is not entirely compatible with the ML estimator (what happens if the solution of the ML 
equation falls outside the allowed range?). In the Bayesian framework, salvation may come from the prior probability 
distributions that make the posterior integrable.^^ Unless the priors are also normal, though, the resulting moments 
cannot be expressed simply as analytical expressions of the Fisher matrix. 

The most benign outcome occurs when the null eigenvectors correspond individually to one or more of the original 
parameters, or when the subspace spanned by null eigenvectors corresponds to a subset of the original parameters. 
The null-eigenvector combinations of parameters may also have clear physical interpretations: for instance, for a 
monochromatic, continuous sinusoid of frequency /, the absolute time offset to and the initial phase 4>o are essentially 
the same parameter, so the Fisher matrix has a null eigenvector along the parameter combination fto — 0o, which 
can be discarded, while ftg -\- 0o remains well determined. A similar case is the degeneracy between luminosity 
distance and a certain function of the sky-position angles in the analysis of short GW chirps with a single ground- 
based detector. Other combinations of parameters can be more ambiguous and troubling — what is the meaning of 
estimating a parameter equal to a mass plus a spin? In those cases, our best hope is again that the degeneracy will 
be cured by prior probabilities, or by higher-order corrections in the 1/A expansion, in which cases the Fisher-matrix 
formalism is certainly insufficient. 





1 JcWc«p(s|c)dc 




1 / p{s\c) dc 



B. Ill-conditioned Fisher matrix 



All nonsingular matrices have well-defined inverses, although these may be difficult to compute. The notion of ill 
conditioning from the theory of linear systems of equations [20| can be invoked here to provide a bound (valid under 



^ For square matrices, the SV decomposition is essentially equivalent to an eigenvector-diagonal-matrix decomposition where we drop the 

rows and columns corresponding to the null eigenvalues and eigenvectors. 
^ In a truly linear system, this is true no matter how small the eigenvector component in that parameter direction; clearly, this raises a 

problem of accuracy in the numerical computation of eigenvectors. 
''' Even a single prior in the form of a rectangle function will regularize the integration over all the null-eigenvector coordinates that include 

that parameter. For normal priors, whether the posterior becomes integrablc depends on the eigenstructure of A^{hi, hj) + Pij. 
^ Although neither of these examples is a linear model described exactly by the LSA, the degeneracy persists in the exact likelihood, 

so its Fisher-matrix diagnosis is correct. For such "perfect" degeneracies to occur, the two parameters must appear in all waveform 

expressions only as a sum or product; this would imply that their units can be sensibly summed, or that their combination has direct 

physical meaning. 
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reasonable conditions) on the perturbation of the inverse of a perturbed matrix, 

"'"i;:.'-';r""" ^-'^^'S^ 

here "|| • ||" is a matrix norm (e.g., the 2-norm ||M||2 = sup^ | |Mx| I2/I |x| I2 derived from the vector 2-norm ||x||2 = 
(Si )"^''^)' ^^'^ K,{M) = \ \M\ \ ||M~^|| is the condition number. Since ||M||2 is equal to M's largest eigenvalue, K2{M) 
is given by the ratio of its largest- to smallest-modulus eigenvalues. From a numerical-analysis perspective, as Finn 
Q points out, the gist of Eq. (|46p is that, roughly speaking, matrix inversion can amplify roundoff error by a factor 
K, leading to the loss of up to logj^g k digits of precision. The same amplification will apply to any inaccuracies in 
our knowledge of M. Taken at face value, this means that the Fisher-matrix results of Eqs. ([24|) and ([36|) may be 
inaccurate at a 100% level if the components of the Fisher matrix are not known to a fractional accuracy better than 

Of course, Eq. (|46| is only an upper bound, and this doomsday scenario needs not be realized in practice. One way 
to check whether the matrix-inversion sensitivity is a concern is to add small random perturbations, Monte Carlo- 
style, to the Fisher matrix elements, and then verify the change in the covariance matrix. Such an experiment for our 
standard SPA model (with mi = TO2 = lOAf0 and no spins) shows that perturbing the 12th significant digits of the 
Fij components is already enough to engender 100% changes in the diagonal elements of {F~^)ij (i.e., the predicted 
parameter variances). This behavior is ~ 100 times less severe than what is predicted by Eq. (|46|) . but it still tells a 
rather cautionary tale about numerical sensitivity in the inversion of that particular Fisher matrix. These problems 
can be cured, somewhat trivially, by adopting higher-precision arithmetics, and by computing the Fisher matrix to 
better accuracy. It may also be possible to improve the condition number by changing the units of source parameters, 
which may reduce the magnitude gap between the largest- and smallest-modulus eigenvalues. 

More to the point, it is the consequences of the Fisher-matrix condition number on the substance (rather than the 
numerical accuracy) of Eqs. ((23l) and pS]) that should attract our attention. A large condition number implies one or 
more small Fisher-matrix eigenvalues, and consequently large statistical fluctuations for the combinations of source 
parameters corresponding to the small-eigenvalue eigenvectors, at least according to the LSA. The interpretation is 
that large parameter changes in the direction of the small-eigenvalue eigenvectors are needed to produce changes in 
the waveform comparable to typical noise fluctuations. Under this condition, we have to worry whether the LSA 
can really describe the likelihood over the entire parameter ranges of interest: of course, these depend on the SNR 
available at detection (at leading order, their extent is inversely proportional to signal strength). In Sec. IVII we 
describe a numerical criterion to decide when the SNR is high enough to believe the LSA. We also have to worry 
whether prior probability distributions for the parameters (perhaps in the simple form of allowed ranges) already 
restrict the estimated (for frequentists) or probable (for Bayesians) values of parameters beyond what is predicted by 
the Fisher-matrix variance. In the next section we discuss a simple test to decide whether priors should be included. 



V. THE BURDEN OF PRIOR COMMITMENTS 

As Cutler and Flanagan point out p. 2691], "it is not necessary for a priori information to be very detailed or 
restrictive in order that it have a significant effect on parameter-extraction accuracy. All that is necessary is that it 
be more restrictive than the information contained in the waveform, for some of the parameters [. . . ] what is more 
surprising is that due to the effects of correlations, the rms errors obtained for the other parameters may also be 
overestimated by large factors." Roughly speaking, this happens because as we move in parameter space, the change 
in the signal can be partially absorbed by changing correlated parameters together; thus, limiting the range available 
to one parameter also limits the range over which a correlated parameter can run while not significantly modifying 
the signal. In this section we seek a practical recipe to determine, in the context of a parameter-estimation problem 
specified by a family of waveforms and a fiducial SNR, whether it is necessary to take priors into consideration when 
evaluating projected parameter accuracies. 

Since prior probabilities can only be discussed consistently in the framework of Bayesian parameter estimation, in 
this section we will restrict ourselves to that context. The Gaussian priors examined at the end of Sec. Ill El are rarely 
appropriate in actual practice, but they do provide a quick test to see if the prior-less Fisher result can be taken as 
it stands, or whether a more careful analysis is needed that includes the effects of priors. In Sec. IV Al wc try out this 



Augmenting the Fisher matrix with normal priors for rj, /3, and a, as described in Sec. IV Al can somewhat cure this instability to inversion, 
although the result is SNR-dependent: for SNR = 10, the errors in [Fij + Pij)~^ become intolerable for fractional perturbations in Fij 
of order 10~^ rather than 10~^^, but in the high-SNR limit the threshold reverts to the latter. 
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t\MalMa i\r] /^P Act A(^o Ato (ms) 



4pp, no priors 


2.9 


X 


10' 


-2 


8.3 


X 


10~^ 






7.3 






3.0 


4pp, NTC prior on -q 


2.7 


X 


10" 


-2 


7.9 


X 


10"^ 






7.0 






2.8 


5pp, no priors 


Li 








5.1 


X 


IQi 


1.2 X 10^ 




3.3 


X 


10^ 


6.9 


5pp, NTC prior on P 


3.0 


X 


10" 


-2 


3.8 


X 


IQ-i 


8.5 




7.7 






3.0 


5pp, NTC prior on 77, /3 


3.0 


X 


10" 


-2 


2.1 


X 


10-1 


4.9 




7.7 






3.0 


6pp, NTC priors on 77, /3 


4.3 


X 


10" 


-2 


2.5 


X 


10""^ 


8.4 


2.5 X 10^ 


4.3 


X 


10^ 


5.3 


6pp, NTC priors on 77, /3, a 


3.0 


X 


10" 


-2 


2.1 


X 


10-1 


5.1 


4.9 


1.1 


X 


lOi 


3.1 


4pp, exact priors on 77 


1.8 


X 


10" 


-2 


5.0 


X 


10-^ 






4.4 






1.9 


5pp, exact priors on rj, [3 


2.9 


X 


10" 


-2 


7.1 


X 


10-^ 


2.4 




7.5 






2.9 


6pp, exact priors on 77, /?, a 


2.9 


X 


10" 


'2 


7.1 


X 


10-^ 


2.6 


2.9 


9.0 






3.0 



TABLE I: Fisher-matrix rms errors in the 4-, 5-, and 6-parameter-estimation problems for a (10 + 10)Mq binary with /3 = cr = 
and SNR = 10, evaluated under different combinations of normal true-parameter-centered (NTC) priors (upper section of table) 
and of the exact priors of Sec. IV Bl (lower section) . The underlined errors are larger than the physical range for the parameter. 



quick test on the SPA model of Sec. IIIII For simplicity, we shall consider the effects of priors as logically independent 
from the sufficiency of the LSA, although the two problems clearly come into play together in real situations. 

A. Testing for the influence of priors (normal true-parameter— centered priors) 

We shall discuss our quick test by way of an example. The standard SPA model of Sec. IIIII has six parameters: 
Mc, 77, /3, CT, 0oi a-iid (plus A, which we disregard). We work at 2PN with SNR = A = 10, with true parameters 
mi = 7712 = lOM© (corresponding to Mc = 8. TIM©, 77 = 0.25), and f3 = a = (po = to = 0. We wish to examine 
the effect of priors for three related parameter-estimation problems involving different subsets of parameters: a 4- 
parameter problem (4pp) where we disregard spin parameters (i.e., where we assume we know a priori that the true 
binary has no spin); a 5pp where spin-orbit coupling [as represented by f3 in Eq. ()4ip ] is important, but spin-spin 
interactions can be neglected; and a 6pp where we include also spin-spin interactions [as represented by a in Eq. 
(I4ip ]. As we shall see, priors become increasingly important as the number of parameters increases. 

In each problem, we compute the expected covariance matrix of the posterior distribution as the inverse of (a 
submatrix of) the Fisher matrix, neglecting any non-LSA effects. We represent priors as normal distributions centered 
around null parameter displacements (i.e., the true parameter value), with standard deviations of 0.25 for 77 and, 
following Poisson and Will 14] 8.5 for f3 and 5 for a (in Ref. [2l|, Berti and colleagues derive and adopt approximate 
priors for (3 and a with standard deviations A/3 — 9.4 and Act = 2.5). This representation is very crude, but it is the 
only one that leads to a simple analytical result [Eq. ([55)1 ] for the posterior covariance, and it should give at least a 
qualitative idea of the effect of imposing rectangular priors covering the allowed parameter ranges. Results are shown 
in the upper section of Table HI and are as follows. 

The first line of Table [J shows the 4pp no- prior Ict values for the single-parameter rms errors (i.e., the square roots 
of the diagonal elements in the covariance matrix). Among these, AMc and A77 seem reasonable, but we get hung 
up on the value of A0o- Can the Ict error region be larger than the physically meaningful range for this angle? On 
general grounds, we should worry that the LSA cannot know that the waveforms are exactly periodic (and therefore 
nonlinear) in the angular parameters, so it blithely extrapolates small-angle effects to infinite ranges. However, as 
pointed out by Cutler [22], this extrapolation is roughly correct for a simple complex phase such as (j)Q [see Eq. 
(HUl) ]. for which the main correlated-parameter effect is to absorb the global phase shifts due to changes in the other 
parameters. A large Ac^o indicates that this absorption can happen through several cycles of phasing. We conclude 
that (f>Q is essentially undetermined, but we have no reason to distrust the errors for AMc, A77, and Atg. 

Applying a prior to 77 does not change the picture significantly, but priors do matter once we add the spin parameters, 
which are very poorly determined at this SNR. In the 5pp, we find unphysically large errors for both f3 and 77, which 
are cured only by imposing priors on both parameters. In the 6pp, we find that a prior is needed also for ct; adding it 



In particular, <l>o is strongly coupled to to, which produces frequency-dependent phase shifts through the exponential exp(27rj/to). 
Adopting the new phase parameter (j>'g = (po + Svr/oto, where /o is the dominant frequency at which f^'^^^/Snif) is maximum, largely 
removes this coupling [22| . 
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engenders measurable changes in AMc and A?]. As a rule of thumb, we should expect such effects whenever the signal 
derivatives show significant correlations, and when the magnitudes of the priors, measured crudely as the squared 
inverses (9™'^^ — 9™™)^"^ of the effective parameter ranges induced by the priors, are comparable to the corresponding 
diagonal Fisher-matrix elements Fa. 

B. Testing for the influence of priors (exact priors) 

We can perform an even better test by evaluating the effects of exact priors while still working in the LSA. Doing 
this requires some numerics, which are however very manageable on a workstation-class system. The idea is to 
integrate {A6iA6j)p as a Monte Carlo sum, which can be accomplished as follows. First, we need to fix a reference 
experiment by drawing the random variable Uj = (n, hj) from its ensemble distribution, which in Gaussian noise is a 
normal distribution with mean zero and covariance matrix Fij = (hi, hj) [see Eq. ([3])]. To do so, we generate a zero- 
mean, unit- variance, normal iV-tuple, and multiply it by y/Fij (where the square root is taken in the linear-operator 
sense and exists for nonsingular Fisher matrices). Note that we cannot work with SNR- invariant expressions (e.g., 
normalized parameter errors 6i), since the priors set a scale for the strength of the signal. 

We can now draw samples distributed with the LSA likelihood p{s\9) oc exp{—{hj,hk)0j9i/2 + rijOj}. To do so, 
we generate zero-mean, unit- variance, normal A'^-tuples, multiply them by {F~j^Y^'^, and offset them by F^^n^ . 
We include the effects of priors (therefore obtaining a population {0^*^} distributed according to the LSA posterior 
probability) by going through the samples, and discarding each of them with a probability 1 — p{9)/ maxgp (for 
rectangular priors the probability of discarding is always or 1). The covariance matrix of the posterior distribution 
can then be computed from the surviving samples. 

We repeat this procedure for many different experiments (i.e., rij's), and take a frequentist average of the covariance- 
matrix components (or study their frequentist distribution). Again, the Bayesian interpretation of this entire procedure 
is as follows: we place the true signal at ^ = 0; we draw from the possible noise realizations according to their ensemble 
probability; and we compute the variance of the posterior distribution for each noise realization. If the priors are 
very restrictive compared to Fisher-matrix-only errors, we may find that we are discarding a very large percentage of 
the samples. To avoid this, we can incorporate a normal approximation to the priors in the probability distribution 
used to generate the samples (i.e., by multiplying normal variates by •\/[(/ii, hj) + Py]"-'-, and offsetting them by 
[{hi,hj) + Pij]~^n^), and then sieve the resulting samples with respect to oc p{9)e-^^^^^^^^^ instead of p{9). It is also 
possible to use rejection sampling [2^, as we did for the results reported in this section, or the Metropolis algorithm 
[2J] with the likelihood or likelihood-plus-NTC-prior as proposal distribution, and the full posterior as the target 
distribution. 

Applying the procedure outhned above to our (10 + 10)Mq system yields the results listed in the lower section of 
Table |T1 We adopt exact priors given by rectangular probability distributions covering the intervals [0, oo] for Mc, 
[0, 0.25] for r], [—8.5, 8.5] for /9, and [—5, 5] for a. Each quoted error is a frequentist average of 200 independent Monte 
Carlo estimates, each computed for a different realization of noise from an initial sampling of 10^ parameter sets, 
reduced to 5 x 10^-2 x 10^ samples after rejection sampling, depending on the estimation problem. 

The expected errors are significantly reduced compared to the NTC-prior estimates. These reductions stem mainly 
from the greater tightness of the rectangular priors, and are especially significant for for 77, for which the symmetric 
NTC prior is indeed very crude. The lesson is that we can use [{hi, hj) + Pij]~^ (i.e., the quick test) to decide whether 
priors are important, but we need something more sophisticated, such as the procedure described in this section, to 
gauge their effects accurately. Of course, this gain in accuracy may be only virtual if the LSA is not warranted for 
our problem. Deciding that question is the object of the next section. 

VI. THE UNBEARABLE LIGHTNESS OF SIGNAL TO NOISE 

As we have seen in Sec. [Ill the high-SNR and LSA limits coincide because larger signal strengths correspond to 
smaller statistical errors, which in turn imply that the linearized-signal expression ((22)) for the likelihood is more 
accurate. The equivalence of the two limits is manifest in the 1/A expansions such as Eqs. (|20p and (|29p . Indeed, 
Finn ^4] cautions that "it is important that the probability contours of interest (e.g. 90%) do not involve [errors] so 
large that the linearization of [the likelihood] is a poor approximation." 

In practice, given a family of waveforms and the true parameter values, we need to ask how high an SNR is needed 
for the limits to yield accurate expected errors. One approach involves comparing the Fisher-matrix results with errors 
computed at the next order in the 1/A expansions: in App. A 5 of Ref. 5], Cutler and Flanagan provide next-order 
formulas for the frequentist variance (although they do not apply them to the Fisher-matrix estimates in the same 
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article). In Sec. IVIII we provide the mathematicals tools to do so in our notation; we must however warn the reader 
that the calculation is rather cumbersome, except for simple waveforms, and the verdict is still not definitive: the 
smallness of a term in a series does not guarantee that the series is converging. 

A simpler approach involves working with the ratio r(6'. A) of the LSA likelihood to the exact likelihood to build a 
consistency criterion for the Fisher-matrix formalism. In this section we shall see that under reasonable conditions, 
the ratio r is given in logarithm by 

\\ogr{e,A)\ = {e,hj-Ah{0),0khk~Ah{0))/2, (47) 

where Ah{9) = h{6) — h{0), A is the signal strength, and 6 is the error (in a sense to be made precise shortly). 
Since h{6) = h{0) + Oihi + • • • , the product in Eq. ([TT]) represents the noise-weighted norm of the higher-than- 
linear contributions to h(9)^ expanded around the true source parameters. The idea of the criterion is to choose an 
isoprobability surface (say, the la surface), as predicted by the Fisher matrix, and then explore it to verify that the 
mismatch between the LSA and exact likelihoods is smaller than a fiducial value (say, | logr| < 0.1), so that we can 
actually believe the LSA in predicting the la surface to begin with. 

We stress that this is just a criterion of consistency. Even if the Fisher-matrix result is internally consistent, it may 
still be inaccurate; conversely, the structure of the ambiguity function across parameter space could conspire in such 
a way as to make the LSA results correct, although we have no reason to expect that in general. In the rest of this 
section, we explain how the criterion comes about in the frequentist (Sec. IVI A[) and Bayesian (Sec. I VI Bp frameworks, 
and we show a concrete example of the criterion in use fSec. IVI C|) . 

A. Frequentist justification of the maximum-mismatch criterion 

As in Sees. UllandlVl we assume that the detector output is s = AJiq + n. In the LSA, the ML estimator is a 
normal variable with mean zero and covariance matrix (1/A'^){hj, hk)~^ [Eq. ([3])]. For a given signal amplitude A, let 
take on the specific value O^"' on its la surface. From Eqs. U), ((22)l . and (|47p . the mismatch ratio r is given by 

ri9'^) = exp {-(s - Ai'ho + 0j%),s - A{ho + 0l''hk))/2} / exp {-{s - Ah(0^^),s - Ah{0^'')) /2] ; (48) 

writing s out, we eliminate all instances of ho: 

\ogr{0^'') = -A^0Y 0l" {hj,hk) /2 + A^ {A^0^''), Ah{0^'')) /2 + A0Y {hj,n) - A{Ah{0^''),n). (49) 

The first two terms in the exponent can be computed given 0^'^; not so the two products involving n. To obtain 
the first, we note that if 0^^ = 0^" , then the noise must be such that 0^'^ = {1/A)(hj,hk)~^(hk,n) [Eq. so 
{hj,n) = A{hj,hk)0l'^- obtain the second, we change our perspective slightly, and average logr(0^°') over all 
noise realizations n compatible with 6'^°'. This is how. Let xj = (hj,n), y = {Ah{0^"'),n): separately, xj and y 
are normal random variables with mean zero and covariances equal to (hj, hk) and (A/i(6'^°'), Ah{0^'^)), respectively; 
taken together, they are jointly normal variables with covariance (hj, Ah{0^'^)) [Eq. ([3]) again]. We now know enough 
to build p{x, y), from which we can derive the conditional probability p{y\x) and compute the conditional mean of y, 
which (after the algebra of App. E]) turns out to be A0Y ihj, Ah{0'^'^)). Altogether, we find 

{\ogr{0^'^))^^e^.)^ A^0Y0l''{h,M)/2 + A'{AK0^n,AH0^n)/2- A^0 
^ {0]-hj-Ah{0'-),0l''hk-Ah{0'-))/2, 

just as anticipated in Eq. (|47|) . The signal strength A enters Eq. ([50]) explicitly, but also implicitly through the 
parameter width of the Fisher-matrix la surface. Thus Eq. (|T7|) can be solved for the A that corresponds to 0^"' 
small enough to yield r as close to unity as desired. Since to leading order 0^"' — Ah{0^'^) — Ahjk0j'^0l'^ , and since to 
leading order 0^"' scales as 1/A, we expect logr to scale as l/A^ for large enough A. 

In summary, the maximum-mismatch criterion is justified from a frequentist viewpoint as a constraint on the ratio 
r at points on a constant-LSA-probability surface, averaged over all realizations of noise compatible with finding the 
ML estimator at those points. 

B. Bayesian justification of the maximum-mismatch criterion 

The justification of the maximum-mismatch criterion from a Bayesian viewpoint requires another slight change of 
perspective. Again we assume s = Ahg + n; this time, however, we expand the waveform not with respect to the true 
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parameters {6 = 0), but to the observed location W^^{n) = 9q of maximum LSA likelihood for a given experiment. 
In the absence of priors, it is with respect to this location that the uncertainty of the posterior would be judged in 
a single experiment. Thus we write h{9) = /i(6'q + 0") ~ /iq + where Hq = /i(6'o)- the """ superscripts serve 

to remind us that the parameter displacements 6*" (and the waveform derivatives /i") are evaluated from (at) 9q. We 
also write A/ig for ho — h^. 

The la surface over which we are going to evaluate the ratio r will be a surface of equiprobable true-signal locations, 
given the observed location 9q of maximum LSA likelihood. In the LSA, the distribution over experiments of the 
true-signal location with respect to 9q is again normal with covariance matrix Thus we have 0" = —9q, 

and the mismatch ratio r is given by 

r{9o) = exp {-(,s - Aih^ + Ofh]), s - Aih'S + e]:K))/2} j exp {-[s - Aho, s - Aho) } ; (51) 

writing s out, the denominator reduces to exp— (n, n)/2, and ho enters the numerator only through A/iq: 

logr{9o) = -A^^9^{hlK)/2- A^Ah^,Ah^)/2 + A'9^{h^,Ah^) + A9^{K,n) - A{^^^^ (52) 

Now, since the LSA likelihood can be written as ^(516*") cx exp-(n + A/iJf - 9fhf,n + AhJ^ - 9^h])/2, the 
ML equation dp/d9f — at 9f = implies {h^,n) — — (/i",A/iq). We handle the last term of the equa- 
tion by evaluating the conditional mean of = —A{Aho,n) given x" = {h^,n) = — A(ft,", A/iq ), producing 
~A{hJ,n)(h],h'^)-'^{hl,Ahl^) = A{^ ,n){h'^ ,h1)-^{h'^,n) (again, see App. E|. We can then use Eq. (USD to re- 
place {h" , h^)~^ (h^ , n) with 9^^ = —9" (working to leading order), so that the last two terms of Eq. ([52]) end up 
canceling out: 

\ogr{9o) = -A'9'p-{^rK)/2 + A'9-Ch],AK) ~ A'{^h'S,^K)/2 
^^{9'^h]-Ah{9''),9'^h'^-Ah{9"))/2. 

Again, this equation can be solved for the A that corresponds to la true-signal locations 0" small enough to yield 
r close to unity. Interestingly, the signs of the frequentist and Bayesian expressions ([50)) and are opposite, 
indicating (at least prima facie) that the likelihood is overestimated in the frequentist case, underestimated in the 
Bayesian case. Given the conditions under which we have obtained Eqs. (|50p and (j53p . it is perhaps best to consider 
only their absolute value as rough indicators of the appropriateness of the high-SNR/LSA limit. 

In summary, the maximum-mismatch criterion is justified from a Bayesian viewpoint by fixing the location of 
maximum LSA likelihood, and then exploring a surface of equiprobable true-signal locations, evaluating for each the 
average of logr over all experiments (i.e., realizations of noise) compatible with having the true signal there. 



C. Practical usage of the maximum-mismatch criterion 

In both the frequentist and the Bayesian pictures, Eq. (|T7)) yields the noise-averaged logarithm of the likelihood 
mismatch, | logrj, as a function of the signal strength A and of a direction in parameter space that identifies a point 
on the la surface, given by the solutions of the LSA equation A^{hj,hk)9j9k = 1, and interpreted as equiprobable 
locations for the ML estimator given the true signal = (in the frequentist picture), or for the true signal given 
the mode of the likelihood aX 9 = (in the Bayesian picture). We use Eq. ([47]) by fixing the signal strength to 
what is reasonably expected in observations, perhaps close to the minimum detection SNR, although the astronomical 
distribution and intrinsic strengths of sources may prompt other choices (e.g., the the supermassive-black-hole binaries 
to be observed by LISA have typical SNRs in the hundreds); and then by evaluating | logrj as a function of direction 
in parameter space. 

Figure [T] shows an example of this procedure for a very simple and benign one-dimensional estimation problem (a 
sinusoid of known amplitude and frequency in Gaussian stationary noise) , where the only parameter left to estimate 
is the initial phase (po {— for the true signal). For each value of SNR = A, the expected la surface consists of just 
the two points (p}f — 1/A. Figure [1] shows | logrj as a function of and therefore of A. If we set a threshold of 
j logrj = 0.1 (the dashed line) to claim the high-SNR/LSA limit as consistent, we see that the consistency criterion 
is not satisfied for A — 1, where j logrj ~ 0.12, but it begins to be satisfied for A > 1.09. Once again, for a given 
SNR, j logrj at is an index of the closeness of the LSA and exact likelihoods at a typical values of the errors, and 
averaged among compatible noise realizations. 

The principle is the same for multiparameter estimation problems, where we have the additional task of sampling 
the entire la surface in a manner consistent with the LSA distribution at la. One way to do so is to obtain the 
eigenvalues A'^'-' and eigenvectors ^j*"* of (hj, hk), and then sample the parameter values 9 = X](I)=i c'^^^9'^^'^ /{AV)^), 
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-1.0 -0.5 0.0 0.5 1.0 

FIG. 1: Consistency criterion for the simple waveform model h[(j)o) = A cos{2tt ft + (j)o) in Gaussian stationary noise, with fixed 
(known) A and /. The curve plots | logr| as a function of the la error (j)})'' — ±1/A, with specific values of A called out by the 
circles. For the threshold | logr-| = 0.1, consistency is achieved for A > 1.09. (To generate this graph, the integration time was 
set to 1000//, and the variance of noise adjusted so that {h{(j)o), /i(0o)) = 1-) 
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|log r\ 



FIG. 2: Cumulative distribution function for |logr-| on the la surface at various SNRs for our reference SPA model with 
nil = ni2 = lOA^Q. The SNR required to have 90% of the la points at | logr| = 0.1 (dashed lines) increases considerably (in 
fact, to unrealistic values) as we move from the 4pp to 5pp and 6pp. This figure was produced without imposing any priors on 
the source parameters. 

with c^*-' distributed uniformly on the iV-dimensional unit sphere. We then obtain the cumulative distribution 
function for the values of | logr|, which we plot in Fig. [5] for our reference model. If we consider the high-SNR/LSA 
limit to be sufficiently realized when |logr| < 0.1 over 90% of the la surface, we conclude that the Fisher-matrix 
formalism (with no priors) is self-consistent for SNRs between 10 and 20 in the 4-parameter problem, between 100 
and 200 in the 5pp, and between 4000 and 10000 in the 6pp. 

The eigenvector directions that push the required SNR toward higher values are usually those associated with 
the smallest-magnitude eigenvalues. To confirm this, and to get some clues about the beyond-LSA structure of the 
likelihood, we can fix the maximum acceptable value of | logr| (say, again to 0.1) and then solve Eq. (|47)) for A as a 
function of direction in parameter space. We do so for the 4pp in Fig. [31 where we show all two-dimensional parameter 
subspaces along pairs of eigenvectors (strictly speaking, were are not sampling a single la surface, but considering the 
set of such surfaces for all SNRs, and determining on which of them | logr| = 0.1, as a function of parameter angle). 



To see why this is the right thing to do, consider the integration of a function against the LSA distribution, and make a change of 
variables (with unit Jacobian) to eigenvalue components, and a second to rescaled components c''^ = AV A(')c'*' : 

|(. . .) e-^^»^».{^..^)/2d0 = |(. . .) e-^' S« [cW] V2rf, oc |(. . .) e- l^"'^'/^dc; 

we see that the source parameters that correspond to c lying on a sphere of fixed radius must lie on an isoprobability surface. To 
reassemble 6 from the c, we need to divide the eigenvectors by AV~\^. 
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FIG. 3: SNR values at which | logr| =0.1 in the 4pp for our reference SPA model with mi — m2 — IOMq. The six subplots 
display sections of parameter space corresponding to all distinct pairs of eigenvectors of the Fisher matrix; the polar radius 
of the curves shows the required SNR (plotted logarithmically from log SNR = 0), while the polar angle is computed between 
pairs of renormalized eigenvector coefficients c'*'. The graph at the bottom right shows the composition of the four Fisher- 
matrix eigenvectors in terms of the source parameters, as well as their respective eigenvalues. This figure was produced without 
imposing any priors on the source parameters. 

In the Bayesian framework, it is also possible to combine the maximum-mismatch criterion vi^ith the normal prior 
test of Sec. IV Al by investigating the values of | logr| on the la surface given by the solutions of the LS A- cum- prior 
equation [(/ij, hk) + Pjk\OjOk = 1. This test can help decide whether the LSA is warranted once priors are factored in: 
it can be shown that Eq. continues to hold with NTC priors, although its interpretation is not as clean, because 
their mode moves around the la surface as we explore it. The maximum-mismatch criterion may indicate that, at 
the signal strengths of interest, the LSA becomes consistent only with the priors; in that case, the reliable predictor 
of source parameter accuracy would not be [{hj,hk) + Pjk]~^ (given the crudeness of NTC priors), but rather the 
result of an LSA Monte Carlo procedure such as that described in Sec. IV Bl 

It is very hard to make a general statement about the errors in the expected accuracies when the LSA Fisher-matrix 
result is not self-consistent. Such errors are strongly SNR-dependent, and it is usually necessary to include parameter 
priors into consideration. As anecdotal evidence, I offer that for our reference model at SNR — 10, a full-blown Monte 
Carlo sampling of the posterior distribution, involving an explicit time-domain realization of noise and adopting the 
priors of Sec. lVBl reports posterior variances that differ from the last three rows of Tab. [T] by few tens percent for the 
4pp, and by factors of a few (for Mc and rj only, since A/? and Act are dominated by the priors) for the 5pp and 6pp. 

In conclusion, I submit that graphs like those of Fig. [2] can be useful to assess the consistency of the "straight" 
Fisher-matrix formalism, are easy to produce with little additional machinery, and should be included in all articles 
that use the formalism to predict the future parameter-estimation performance of GW observations. If a single number 
must be quoted, it could be the SNR at which 90% of the Ict surface yields | logr| < 0.1. 

VII. BEYOND THE LINEARIZED-SIGNAL APPROXIMATION 

In this section we develop mathematical tools to derive higher-than-LSA expressions for the frequentist mean and 
variance of the ML estimator over an ensemble of noise realizations (Sec. IVII A[) . and for the Bayesian mean and 
variance of the posterior distribution (without priors) in a single experiment (Sec. IVII Bl) . These expressions provide 
corrections to the Fisher-matrix result, and can therefore be used to check its accuracy, as suggested by Cutler and 
Flanagan [5], who derive a general expression for the 1/A'^ correction to the frequentist variance. A formal treatment 
of the 1 /A expansion for the frequentist moments can be found in Barndorff-Nielsen and Cox [2^ and in Zanolin, 
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Naftali, and Makris [2g|, who use the expansion to determine conditions for the ML estimate to become unbiased and 
attain the Cramer-Rao bound [l^l • 

However, computing higher-order corrections involves a considerable amount of tensorial algebra that calls for 
the use of specialized software, such as MathTensor [2^; they also involve higher-than- first derivatives of the 
waveforms and products of several inverse Fisher matrices, which may raise concerns about the numerical accuracy 
of the computations. Throughout this section, we distinguish between covariant and contravariant indices (as in 
ni = (hi, n) and 0*, respectively); in fact, we find it convenient to use the inverse normalized Fisher matrix (/i% h^)~^ 
to raise indices, therefore hiding its repeated appearance in tensor expressions. 



A. In the frequentist framework 

Using the 1/A expansion of Eqs. (fT8|) and (fT9|) . the perturbative ML equations can be written in general as 

with = {n,hi)/Ol, Nij = (hij,n)/ll, Nijk = {hijk,n)/2\ (and so on), and 
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(55) 

Hijki = 



where hij = {hi,hj), hijk = {hi,hjk) (and so on), and where the two factorials at each denominator are those 
(respectively) of the number of indices before the comma minus one, and of the number of indices after the comma. 



Also, the of Eq. ([5^ arc the unknown l/A" contributions to 6 (as in Eq. p^ . dropping hats for simplicity), 

'(2) 



while the multi-index parameter objects such as Ot^^ are given by 



njk _ nj nk njk _ nj nk i nj ak njk _ nj nk i nj nk i nj nk 

'^(2) — '^(1)^(1)' '^(3) ^ '^(1)'^(2) + '^(2)'^(1)' '^(4) ^ ''(1)''(3) + '^(2)'^(2) + '^(3)'^(1)' 



^(3) - ^(i)^a)^(i)' ^(4) ~ ^(i)^a)^(2) + ^(1)^(2)^(1) + ^(2)^a)^(i)' (^6) 

njklni _ r.j nk nl am 
V) - '^(l)'^(l)^l)^l)' 

and so on. In general, the object will consist of as many addends as there are partitions of n into m integers, 

including all permutations of each partition. For instance, the n — b, m = i object would have terms for each of 
the partitions 1-l-l-h 3, l-h3-hl, 3-hl + l, 1 + 2-^2, 2 + 14-2, 2 + 2 + 1. 

The solution of each equation in Eq. (|54p is trivial given the solutions of all equations of lower order. Since the 
inverse matrix {H~^Y^ = {hi,hj)^^ = A^F^^ appears multiple times in the solutions (because Hij multiplies the 

unknown 0^^^ in each equation), it is convenient to adopt a compact notation that hides the {H~^y^ by raising every 
index into which they are contracted. We then find 

^(2) = ^^-^^i) - - N'^.N^ - H^,kN=N\ ^^^^ 

^(3) = ^'i^(2) + ^'ife^(i)^a) ~ jk{0li)0\2) + ^(2)^a)) ~ jkiO[^)0\^)0\i) = ■■■ 
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The frequentist mean and covariance of the 9^ can be buih from these sohitions, remembering the Wick-product 
rule [1^ for the ensemble average of products of Gaussian variables: 



((a, n))„ = 0, 
{{a,n){b,n))n = (a, 6), 
{{a,n){b,n){c,n))n = 0, 
{{a,n){b,n){c,n){d,n))n = {a,b){c,d) + {a,c){b,d) + {a,d){b,c) 



(58) 



(for any signals a, b, c, and d), where all the products with an odd number of factors vanish, while the products with 
an even number of factors are given by the sum of terms corresponding to all distinct pairings of signals into inner 
products. Thus we find that all the (^odd (*:))" vanish, while the first non-zero correction to {0^)n is 



(59) 



(1))« + ^(^(1)^(3) +^(2)^(2) 



!))„ + 



As for the covariance, 

{9^6^)^ ~ {9')n{9')n = 



(60) 

where all the stricken-through terms vanish because they are proportional to ensemble products of an odd number of 
n terms. The surviving contributions are given by 

(^(2)^(2))„ = (iV*fcA^''A^\.^™)„ - i/\.,(7V'-7V'iV^™iV")„ - iJ\,,(iV"7V«iV\.iV'=)„ + WMW^^^{N''N'N'"'m)^^, 

(61) 

and of course (^(3')^(i'))n = (^(i)^(3))n- The first of these equations reproduces the standard Fisher-matrix result. The 
four-iV products in Eq. ((6T1) follow from Eq. (|58p . For instance, the last two products are given by 



(62) 



These expressions can be substituted into those of Eq. , and those into Eq. (pD|) , yielding the frequentist variance 
to order Unfortunately, this requires computing second- and third-order waveform derivatives (the latter for 

Hjklm ) • 



B. In the Bayesian framework 



To generalize Eq. (I34p . we write 



and find the recurrence relation 



9>(0) 



d9. 



(63) 



(64) 
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which we may prove by expanding the identity 1^°^ +eX(^) + ^I*-^-* + • • • ^ 
on both sides as a series of e, leading to 

j=Q \ / 

whence Eq. To obtain all needed derivatives with respect to e, we rewrite Eq. (P^)) as 



p(s|0) oc exp| — {n,n)/2 ' 



(65) 



(66) 



where Ni — {n,hi)/V. — Ui, Nij — {hij,n)/2l ~ nij/2, Nijk — {hijk,n)/3l (and so on), and also the Hj_^...j^ have 
slightly different denominators than the Hj-^...j^ of Eq. ([SS]) : 



^^3k - i!2! 2! 1! ' 

Tji _ ^'i'^jkl hij,kl ^ijkjl 

ijki = + 2! 2! 3! 1! ' 



(67) 



namely, the denominator is m! l\ for the product /iji - - = (^ji - j„: Expanding as a series of e yields 

p{s\e) oc e"^.'=»'»V2+«.9^ X |l + e(iV,fc^^^^ - ^^H'^J'^') 



so that the I^"^ and A/"*-"-* are given by expressions akin to 

Now, the integrals of the general form 

can be computed with the Wick identity^^ [2^ 

(F(0-))^°) = exp{n.(i7^^r^n,/2}; 
in particular (again using (iJ'-')^^ to raise indices). 



(68) 
(69) 

(70) 
(71) 



(72) 



Another way to organize this computation is to offset the integration variable 0^ to 9^ ~ (H^^) = 9^ — in Eq. ((TO}, obtaining 

(en . . . = J {B^i + n'l) ■ ■ ■ (e"'- + n"")e-f^.'=«'«''/2d0 J j e-^o"^' ^^d9; 

we can then expand the product in the integrand, bring the n'l' outside the integral, and apply Wick's theorem [Eq. 11581 ] to obtain 
each addend of the form 

all integrals with odd I are zero, while the integrals with even I are given by the sum of all possible pairings of indices into products of 
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Unfortunately, the l/A^ (i.e., e^) corrections to the variance turn out to be rather unwieldy, and belong in 



a 



symbolic-manipulation software package rather than on these pages. We content ourselves with the l/A^ correction 
to the posterior mean (remember that the normalized parameters carry an A), 



n k-n 



kl 



(73) 



and the \/A^ correction to the variance, 



- n'n^n'^i fhj + h''^^ + i/ifcj^n'n™)] + 0{e^). (74) 



Thus we see that the contribution to the variance does not vanish in any single experiment (unless = 0). 

It does vanish, however, under frequentist average, since it involves products of odd numbers of noises. 



VIII. CONCLUSION 



In this article I tried to provide, as it were, a user's manual for the Fisher information matrix. It seems clear 
that the Fisher-matrix formalism will continue to be featured prominently in research dealing with the parameter- 
estimation prospects of future GW observations, because of its compactness and accessibility, and because of the 
difficulty of computing higher-order corrections and running full-blown simulations. Yet the three questions posed in 
the introduction loom over the credibility of Fisher-matrix results, which is all the more worrisome when these results 
are used to justify choices in science policy or experiment design. 

The recipes provided in this paper to answer the initial questions can help assert (or falsify) the accuracy of the 
formalism for specific signal models. In particular: 

1. As discussed in Sec. IIVI ill-conditioned or singular Fisher matrices point to the need for increased numerical 
accuracy, and occasionally to a case for discarding a parameter or combination of parameters, but more often to 
suspicions about the appropriateness of the high-SNR/LSA limit. Section HV Al describes how to use the singular 
value decomposition of the Fisher matrix to discard truly degenerate linear combinations of parameters; Sec. 
IIVBI describes how to roughly assess the sensitivity of the Fisher- matrix inverse to numerical error by means of 
the Fisher-matrix condition number, and more carefully by a simple Monte Carlo test. 

2. The necessity of including prior distributions for the source parameters, perhaps in as simple a form as uniform 
distributions over the physically allowed ranges, can be roughly assessed by verifying whether Fisher-matrix 
results change with the addition of simple Gaussian priors, as shown in Sec. IV Al more accurate estimates of 
the effect of priors can be obtained by integrating the variance of an exact-prior-LSA-likelihood posterior with 
the simple Monte Carlo algorithm of Sec. IV Bl 

3. The detected-signal strength (i.e., the SNR) necessary for Fisher-matrix results to be internally consistent can 
be evaluated with the likelihood-mismatch criterion that follows from Eqs. ([50]) and ([55]) of Sec. IVII or (at the 
price of some algebra) by computing the higher-order corrections presented in Sec. IVIII 

If the Fisher-matrix formalism remains inconsistent at the SNRs of interest, even with the help of priors, there is little 
recourse but to embark in explicit Monte Carlo simulations of frequentist Q or Bayesian 2] parameter estimation. 
Such simulations can consistently include sophisticated priors, and explore the secondary maxima of the posterior (or 
likelihood, in the frequentist case). They are the gold standard of this trade, but as such they are expensive in human 
effort and CPU resources. The recipes given in this paper can help establish when they are truly needed. 
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APPENDIX A: LEMMA FOR THE CONDITIONAL AVERAGE OF JOINTLY NORMAL RANDOM 

VARIABLES 



Assume the vector Xj and the scalar y are jointly normal random variables with mean zero and covariance matrix 

C = I I . (Al) 

\H, G J 

From the standard Frobenius-Schur formula for the inverse of a block matrix [30l | , 



A b\ ^( A-^ + A-^BS^^CA-^ -A-^BS^^ 



(A2) 



(with Sa = D ~ CA ^B the Schur complement of A), we find 

(A3) 



iKk Hk) 



since in our case F^^ ^ is symmetric and Sa is the scalar G — {F-^ ^HiHj). Now, the joint distribution of Xj and y is 
given by 

p(a;,y)cxexp-|(x, y).C-i- (A4) 

while the conditional distribution of y given Xj is p{y\x) = p{x, y)/p{x) = p(x, y)/ [J p{x, y) dy] . Since however p{x) 
can be a function only of x, by the properties of Gaussian integrals it must be that p{x) cx exp (• • • )ijXiXj. It follows 
that p{y\x) must be of the form 

p{y\x) cx exp - „ 2S^\x,F^^' H,)y + (• ■ ■),,x,Xj} / 2, (A5) 

from which, by inspection, we conclude that 

(2/)x. = J yp{y\x) dy = x,F^r.^H, (A6) 

and that 

var^.y ^ Jiy- (y>.jV(2/k) dx = SA = G- Fr^^H^H,. (A7) 



[1] E. T. Jaynes and G. L. Bretthorst (ed.). Probability theory: the logic of science (Cambridge University Press, Cambridge, 
2003). 

[2] See, e.g., N. Christensen and R. Meyer, Phys. Rev. D 58, 082001 (1998); N. Christensen, R. ,]. Dupuis, G. Woan, and R. 

Meyer, Phys. Rev. D 70, 022001 (2004). 
[3] D. Nicholson and A. Vecchio, Phys. Rev. D 57, 4588 (1998). 
[4] L. S. Finn, Phys. Rev. D 46, 5236 (1992). 
[5] C. Cutler and E. E. Flanagan, Phys. Rev. D 49, 2658 (1994). 

[6] R. Balasubramanian, B. S. Sathyaprakash, and S. V. Dhurandhar, Phys. Rev. D 53, 3033 (1996). 
[7] R. Balasubramanian and S. V. Dhurandhar, Phys. Rev. D 57, 3408 (1998). 

[8] L. A. Wainstein and L. D. Zubakov, Extraction of signals from noise (Prentice-Hall, Englewood Cliffs, NJ, 1962). 

[9] A. V. Oppenheim, A. S. Willsky, and I. T. Young, Signals and systems (Prentice-Hall, Englewood Cliffs, NJ, 1983). 
[10] S. M. Kay, Fundamentals of statistical signal processing: estimation theory (Prentice-Hall, Englewood Cliffs, NJ, 1993). 
[11] G. L. Bretthorst, Bayesian spectrum analysis and parameter estimation (Springer- Verlag, New York, 1988). 
[12] A. Abramovici et al.. Science 256, 325 (1992). 

[13] T. Damour, B. R. Iyer and B. S. Sathyaprakash, Phys. Rev. D 63, 044023 (2001); 66, 027502 (2002). 
[14] E. Poisson and C. M. Will, Phys. Rev. D 52, 848 (1995). 



24 

[15] K. G. Arun, B. R. Iyer, B. S. Sathyaprakash, and P. A. Sundararajan, Phys. Rev. D 71, 084008 (2005); erratum, 72, 

069903 (2005). 

[16] L. Blauchct, T. Damour, G. Esposito-Farcsc, and B. R. Iyer, Phys. Rev. Lett. 93, 091101 (2004). 
[17] L. E. Kidder, C. M. Will, and A. G. Wiseman, Phys. Rev. D 47, R4183 (1993). 

[18] See, e.g., A. K. Gupta and D. K. Nagar, Matrix Variate Distributions (Chapman and Hall/CRC, Boca Raton, FL, 2000). 
[19] A. Tarantola, Inverse problem theory and methods for model parameter estimation (SIAM, Philadelphia, PA, 2005). 
[20] G. Golub and C. van Loan, Matrix computations, 3rd ed. (Johns Hopkins Univ. Press, London, 1996). 
[21] E. Berti, A. Buonanno, and C. M. Will, Class. Quant. Grav. 22, S943 (2005). 

[22] C. Cutler (private communication). 

[23] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C (Cambridge University Press, 
Cambridge, 1988). 

[24] N. Metropolis et al., J. Chem. Phys. 21, 1087 (1953); W. K. Hastings, Biometrika 57, 97 (1970). 

[25] O. E. BarndorfT-Nielsen and D. R. Cox, Inference and Asymptotics (Chapman and Hall, London, 1994). 

[26] M. Zanolin, E. Naftali, and N. C. Makris, in preparation. 

[27] E. Naftali and N. C. Makris, J. Acoust. See. Am. 110, 1917 (2001); A. Thode, M. Zanolin, E. Naftah, I. Ingram, P. Ratilal, 

and N. C. Makris, ibid. 112, 1890 (2002). 
[28] L. Parker and S. M. Christensen, MathTensor: A System for Doing Tensor Analysis by Computer (Addison- Wesley, 

Reading, MA, 1994). 

[29] J. Zinn-Justin, Path Integrals in Quantum Mechanics (Oxford University Press, Oxford, 2005). 
[30] E. Bodewig, Matrix Calculus (North-Holland, Amsterdam, 1959). 



